Finite automata for compact representation of tuple dictionaries

Authors

  • Jan Daciuk
  • Gertjan van Noord
Abstract

A generalization of the dictionary data structure, called a tuple dictionary, is described. A tuple dictionary represents a mapping from n-tuples of strings to some value. This data structure is motivated by practical applications in speech and language processing, in which very large instances of tuple dictionaries are used to represent language models. A technique for compact representation of tuple dictionaries is presented. The technique can be seen as an application and extension of perfect hashing by means of finite-state automata. Preliminary experiments indicate that the technique yields considerable space savings, of up to 90% in practice.
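The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several simplifying assumptions: a plain trie stands in for a minimized finite-state automaton, a single automaton is shared by all tuple positions, and an ordinary Python dict stores the number tuples. All names (`Node`, `build`, `word_to_number`, `TupleDictionary`) are hypothetical. Each state stores how many words can be completed from it, which lets a word be mapped to its rank in sorted order (a perfect hash); tuples of strings then become tuples of small integers.

```python
from typing import Dict, Iterable, Optional, Tuple


class Node:
    """One state of a trie, used here as a simple acyclic automaton."""

    def __init__(self) -> None:
        self.edges: Dict[str, "Node"] = {}
        self.final = False
        self.count = 0  # number of words that can be completed from this state


def _fill_counts(node: Node) -> int:
    # A state's count is 1 if it is final, plus the counts of its children.
    node.count = int(node.final) + sum(_fill_counts(c) for c in node.edges.values())
    return node.count


def build(words: Iterable[str]) -> Node:
    """Build the trie for a set of words and annotate every state with its count."""
    root = Node()
    for word in set(words):
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, Node())
        node.final = True
    _fill_counts(root)
    return root


def word_to_number(root: Node, word: str) -> Optional[int]:
    """Return the rank of `word` among the stored words in sorted order, or None."""
    node, index = root, 0
    for ch in word:
        if ch not in node.edges:
            return None
        if node.final:
            index += 1  # a stored proper prefix sorts before the word
        for label, child in node.edges.items():
            if label < ch:
                index += child.count  # skip all words under smaller labels
        node = node.edges[ch]
    return index if node.final else None


class TupleDictionary:
    """Maps n-tuples of strings to values via n-tuples of word numbers."""

    def __init__(self, entries: Dict[Tuple[str, ...], int]) -> None:
        # Assumption: one automaton shared by every tuple position.
        self.automaton = build(w for key in entries for w in key)
        self.table = {
            tuple(word_to_number(self.automaton, w) for w in key): value
            for key, value in entries.items()
        }

    def lookup(self, key: Tuple[str, ...]) -> Optional[int]:
        numbers = tuple(word_to_number(self.automaton, w) for w in key)
        if None in numbers:
            return None  # at least one word is unknown
        return self.table.get(numbers)
```

A bigram count table, for example, could be built as `TupleDictionary({("the", "cat"): 12, ("the", "dog"): 7})` and queried with `lookup(("the", "cat"))`. In this sketch the space saving would come from storing each distinct word once in the automaton and keeping only integers in the table; a real implementation would also minimize the automaton and pack the integer tuples compactly.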


Similar resources

A Finite-State Library for NLP

A library of functions is described which use finite-state automata for compact storage and efficient usage of very large dictionaries and language models. The library can be used to test whether a word is in a dictionary, to perform morphological analysis, to construct perfect hash tables, and to construct and use very large language models (such as models which employ bigram and trigram frequ...


A Time-Efficient Token Representation for Parsers

One of the most important functions of linguistic tools is to apply grammars to texts in order to find matching sequences. These grammars are often represented by finite-state automata. The expressions described in these grammars are usually lexical units, but some systems offer the possibility of dealing with references to sets of words which require the use of electronic dictionaries (ex. ...


Novel LVCSR Decoder Based on Perfect Hash Automata and Tuple Structures – SPREAD –

The paper presents the novel design of a one-pass large vocabulary continuous-speech recognition decoder engine, named SPREAD. The decoder is based on a time-synchronous beam-search approach, including statically expanded cross-word triphone contexts. An approach using efficient tuple structures is proposed for the construction of the complete search-network. The foremost benefits are the impor...


A new view on fuzzy automata normed linear structure spaces

In this paper, the concept of fuzzy automata normed linear structure spaces is introduced and suitable examples are provided. The concepts of fuzzy automata $\alpha$-open sphere, fuzzy automata $\mathscr{N}$-locally compact spaces, and fuzzy automata $\mathscr{N}$-Hausdorff spaces are also discussed. Some properties related to fuzzy automata normed linear structure spaces and fuzzy automata $ma...


Compiling Apertium morphological dictionaries with HFST and using them in HFST applications

In this paper we aim to improve interoperability and re-usability of the morphological dictionaries of the Apertium machine translation system by formulating a generic finite-state compilation formula that is implemented in the HFST finite-state system to compile Apertium dictionaries into general purpose finite-state automata. We demonstrate the use of the resulting automaton in FST-based spell-checki...



Journal:
  • Theor. Comput. Sci.

Volume 313


Publication date: 2004